## Data Preparation

### Transcriptomics data
STAID requires the following input data:

- **scRNA-seq data (raw counts)**
  - Must be provided as an `AnnData` object before deconvolution.
  - The data should contain **raw count values** (not normalized or log-transformed).
  - Cell type annotations must be stored as a column in `adata.obs`, e.g.:
    ```python
    sc_adata.obs['celltype']
    ```

- **Spatial transcriptomics data (raw counts)**
  - Must be provided as an `AnnData` object before deconvolution.
  - The expression matrix should contain **raw count values**.
  - Spatial coordinates (e.g., `spatial`) should be included in `adata.obsm`.

Both datasets should share a common set of genes (overlapping gene symbols), which STAID uses to perform deconvolution.


---

### Example Datasets
The demo spatial transcriptomics data (human breast cancer Visium) are available at https://doi.org/10.5281/zenodo.4739739,
and the matching human breast cancer scRNA-seq reference data are available from the Gene Expression Omnibus under accession number GSE176078.

For convenience, we also provide a sorted version on Google Drive: [Download from Google Drive](https://drive.google.com/drive/folders/1-GhHslCBIYvNFb1Zs3DmLKVg9JZx1QSP?usp=sharing).